A Architecture Details
We provide additional architectural details here beyond those provided in the paper. In all models, the output layer consists of the computation of logits, followed by a softmax cross-entropy categorical loss term. Figure 6 provides the grammar.

Figure 6: Grammar describing the generated programs comprising the dataset in this paper.

Figure 8: The same programs as in Figure 7, with a single statement masked in each.
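
As an illustration of the output layer described in this appendix (logits followed by a softmax cross-entropy categorical loss), the following is a minimal sketch in plain Python. The function name `softmax_cross_entropy`, the example logits, and the number of classes are assumptions made for illustration and are not taken from the paper's implementation.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy of softmax(logits) against the integer class `target`.

    Hypothetical helper for illustration; not the paper's code.
    """
    # Subtract the max logit before exponentiating (log-sum-exp trick)
    # so that large logits do not overflow.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    # The loss is the negative log-probability of the target class:
    # -log softmax(logits)[target] = log_z - logits[target].
    return log_z - logits[target]


# Assumed example: logits over three output classes, target class 0.
loss = softmax_cross_entropy([2.0, 0.5, -1.0], target=0)

# With identical logits the prediction is uniform, so the loss is log(3).
uniform_loss = softmax_cross_entropy([0.0, 0.0, 0.0], target=1)
```

The loss shrinks toward zero as the logit of the target class grows relative to the others, and equals the log of the number of classes when all logits are equal.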